David Zhang
By bridging bioinformatics and engineering, I translate genetic and transcriptomic data into software that delivers real-world impact. With experience across the full software development lifecycle, I design, build, and deploy tools to solve bioinformatic problems — from prototyping innovative solutions to implementing and maintaining robust, production-ready pipelines.
Work Experience
Senior bioinformatics engineer
London, UK (hybrid)
Present - 2024
- Optimise and scale machine learning tools for single-cell Perturb-seq data comprising millions of cells. Apply these tools to generate actionable insights and inform strategic decisions around company direction.
- Design and deploy a data pipeline to ingest, tidy and version-control data for the CoSyne knowledge graph. Automate the release of the graph to AWS using Terraform and CI/CD, improving the efficiency and traceability of data updates.
- Build and maintain infrastructure tooling including Docker images, Terraform modules, CI/CD workflows and cruft templates to streamline bioinformatics analyses.
Senior bioinformatics software engineer
Hinxton, UK (hybrid)
2024 - 2022
- Developed scalable Nextflow pipelines to process solid tumour DNA-sequencing data covering alignment, variant calling, driver mutation annotation, and therapy matching.
- Built Python and R packages to improve the efficiency of clinical verification, reducing the time taken by 2 weeks per quarterly release.
Bioinformatician internship (2 months)
London, UK (remote)
2021
- Created a reproducible aberrant splicing detection pipeline using Docker for drug target discovery in C9orf72 ALS patients.
Education
PhD, Bioinformatics
University College London
London, UK
2022 - 2017
- Analysed bulk RNA-sequencing data with the aim of improving the diagnosis rate of rare disease patients. Focussed on detection of aberrant splicing events as a strategy to prioritise pathogenic variants.
- Released R/Bioconductor packages that enable bioinformatics analyses and interpretation. Championed best practices for software development through teaching workshops and courses.
MSc, Neuroscience
University College London
London, UK
2016 - 2015
- Grade: Merit (68%)
BSc, Biomedical Science
University College London
London, UK
2015 - 2012
- Grade: 2:1 (69%)
Open-source software
Web development
Present - 2022
- Portfolio website: Showcases my favourite open-source contributions. Built with Django and deployed using PythonAnywhere.
Python packages
2023 - 2021
- autogroceries: Use Selenium to automate your grocery shop.
- stravaboard: An extendable Streamlit dashboard for tracking Strava runs.
R packages
2022 - 2020
- ggtranscript: Visualising transcript structure and annotation using ggplot2.
- dasper: Detection of aberrant splicing events in RNA-sequencing data.
Selected Publications
A complete list of my publications is available via Google Scholar
ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2
Bioinformatics
2022
- Role: Co-first author
Developmental Consequences of Defective ATG7-Mediated Autophagy in Humans
The New England Journal of Medicine
2021
- Role: Analyst
Megadepth: efficient coverage quantification for BigWigs and BAMs
Bioinformatics
2021
- Role: R package developer
Incomplete annotation of disease-associated genes is limiting our understanding of Mendelian and complex neurogenetic disorders
Science Advances
2020
- Role: First author